vectorscale: pack-positions pre-pass + geometric crossing intersection by northbymidwest · Pull Request #909 · libretro/slang-shaders

northbymidwest · 2026-04-30T19:57:00Z

This PR has two parts: a per-fragment performance optimization (the pack-positions pre-pass) and a correctness improvement for crossings (geometric curve-curve intersection). Both touch only the existing pipeline — no changes to similarity-graph, resolve-crossings, cell-graph, init-positions, or optimize-energy.

Performance — `pack-positions` pre-pass

New shader inserted between the final update-tjunction iteration and cell-rasterizer. For each CP slot, it packs the full per-CP render geometry into 3 horizontally-adjacent texels of a PackedPositions framebuffer:

(cx*3 + 0, cy_slot) = (pp.x, pp.y, prev_ci_or_-1, _)
(cx*3 + 1, cy_slot) = (cp.x, cp.y, t_branch, validity)   validity: 0=skip, 1=normal, 2=2-CP-line
(cx*3 + 2, cy_slot) = (np.x, np.y, next_ci_or_-1, _)

(pp, cp, np) is already ghost-extended (pp = 2·prev − cp etc.) for endpoint neighbors. t_branch is computed in the right way per CP type (see correctness section below for crossings; closed-form cubic project for one-sided clamped Beziers; 0.5 otherwise).

What this lets the rasterizer's test_one_cp skip per pixel:

The 2 neighbor-index fetches from CellGraph rows 1, 2 — neighbors come back inline in B channels of texels 0/2.
The 2 follow-up FinalPositions fetches for prev/next.
The follow-up flag fetch on the neighbor needed for the IS_ENDPOINT check that drives ghost extension.
The in-shader ghost-construction math, 2-CP-chain branching, and t_branch cubic-solver call.

Cost added: 3 fetches from PackedPositions per CP probe instead of 1 from FinalPositions. Net per active probe: ~6 fetches → 4 (1 flag + 3 packed reads). The pack-positions pass itself runs once-per-frame, O(num_cps) work — 3 fragments per CP slot.

Measured ~5–10% frame-time reduction on dense frames (sprites with many active CPs / large viewport scale where the rasterizer is the dominant cost). Sparser frames see less; fewer active CPs means fewer per-pixel probes get past the flag check and the existing 3-sample distance² quick screen, so the rasterizer spends proportionally less of its time in the body that pack-positions actually shortens.

resolve_hit's neighbor flag/dir lookups for color resolution are unchanged — neighbor indices still flow through to it via the packed texels' B channels.

Correctness — respect the optimizer's crossing positions

At a 4-way crossing the rasterizer's wedge-AA junction lines need to anchor at the geometric meeting point of the N-S and E-W B-spline curves. The previous approach didn't compute that meeting point directly: update-tjunction ran a ghost-aware inverse B-spline correction that relocated each crossing CP to the position that would make the rendered curve pass through the grid corner at exactly t=0.5. That overrode the position the optimizer had chosen — the CP got pulled away from its energy-minimum back toward the integer grid corner so the rasterizer's downstream t_branch=0.5 assumption worked out.

This PR keeps the optimizer's crossing position and computes the actual curve-curve intersection inline in pack-positions:

Solve F(t, s) = B_a(t) − B_b(s) = 0 by 2D Newton iteration starting from (t, s) = (0.5, 0.5). The optimizer keeps crossings near the grid corner so the initial guess is within ~0.1 of the answer; quadratic convergence gets the residual below f32 epsilon in 3 iterations (4 used for safety). Each step inverts the 2×2 Jacobian analytically; an early break on |det(J)| < 1e-12 handles the tangent / parallel-curves degenerate case (doesn't fire in practice).
Slot 0 of the crossing pair gets t_a (parameter on the N-S curve), slot 1 gets t_b (E-W). The CP itself stays at the optimizer's final position — no relocation.
The rasterizer reads t_branch straight from PackedPositions and uses it as the wedge-AA junction parameter, so J = beval(curve, t_branch) lands on the geometric intersection at whatever t the two curves actually cross.

update-tjunction.slang loses the IS_CROSSING branch entirely; only T-junction stem snap remains. The Opt2 sampler / read_orig_pos helper / Opt2Size UBO field are gone (no longer read).

Pipeline diff

11 passes (was 10):

similarity-graph
resolve-crossings
cell-graph
init-positions
optimize-energy ×2
update-tjunction ×3
+ pack-positions      ← new
cell-rasterizer

vectorscale.slangp updated accordingly.

Verification

All 8 shaders compile cleanly to SPIR-V via glslangValidator.
Visual smoke test in RetroArch (Vulkan backend) on Game Boy and pixel-art content shows no regressions vs the previous version; crossing CPs now sit where the optimizer wanted them.

Co-authored-with @anthropic-ai/claude-code.

Adds a per-CP pre-pass (pack-positions) that denormalizes render geometry into a single PackedPositions texture and folds the crossing curve-curve intersection into the same pass. The rasterizer reads its full per-CP geometry from PackedPositions and skips ghost extension, neighbor-index decoding, and t_branch solving in its hot loop. New shader: pack-positions.slang For each CP slot, packs into 3 horizontally-adjacent texels: col 0 = (pp.x, pp.y, prev_ci_or_-1, _) col 1 = (cp.x, cp.y, t_branch, validity 0=skip 1=normal 2=line) col 2 = (np.x, np.y, next_ci_or_-1, _) (pp, cp, np) is the ghost-extended (pp = 2·prev - cp etc.) Bezier control triple. t_branch is computed per CP type: - IS_CROSSING: 2D Newton iteration on F(t,s) = B_a(t) - B_b(s) = 0, starting from (0.5, 0.5). The optimizer keeps crossings near the grid corner so the initial guess is within ~0.1 of the answer; 4 iterations drive the residual below f32 epsilon. Reads neighbor positions from both this slot's chain (N-S or E-W) and the partner slot's chain. This replaces the legacy ghost-aware inverse-correction that moved each crossing CP so the rendered curve passed through the grid corner at t=0.5. The CP now stays at its optimizer-final position and the rasterizer's wedge AA anchors at the geometric intersection B_a(t) = B_b(s). - 2-CP chain (degenerate stem with both ends as endpoint markers): t_branch = 0.5; render geometry pre-built as a straight line so the rasterizer dispatches to its closed-form line solver via is_line. - One-sided clamped Bezier (prev or next is endpoint): closed-form cubic project of the interior B-spline midpoint onto the clamped span — finds the t at which the rendered clamped curve reaches the same physical "before/after sc" boundary an interior B-spline would at t=0.5. - Else: t_branch = 0.5. Modified: update-tjunction.slang Drop the IS_CROSSING ghost-aware inverse-correction branch; crossings pass through unchanged. Drops the now-unused Opt2 sampler binding, read_orig_pos helper, and Opt2Size UBO field. Modified: cell-rasterizer.slang Replace read_pos + read_neighbors + ghost extension + 2-CP-chain construction + t_branch cubic-solver in test_one_cp with a single read_packed_cp(ci) call returning a PackedCp struct. Per-active-probe fetch count: ~6 → 4 (1 flag + 3 packed reads). resolve_hit's neighbor-direction lookups for color resolution are unchanged. Modified: vectorscale.slangp 11 passes (was 10). pack-positions inserted between the final update-tjunction iteration (FinalPositions) and cell-rasterizer. PackedPositions framebuffer is 3.0 × source-relative wide.

northbymidwest marked this pull request as draft April 30, 2026 19:59

northbymidwest marked this pull request as ready for review April 30, 2026 20:28

northbymidwest marked this pull request as draft April 30, 2026 22:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vectorscale: pack-positions pre-pass + geometric crossing intersection#909

vectorscale: pack-positions pre-pass + geometric crossing intersection#909
northbymidwest wants to merge 1 commit intolibretro:masterfrom
northbymidwest:vectorscale-pack-positions

northbymidwest commented Apr 30, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

northbymidwest commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance — pack-positions pre-pass

Correctness — respect the optimizer's crossing positions

Pipeline diff

Verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

northbymidwest commented Apr 30, 2026 •

edited

Loading

Performance — `pack-positions` pre-pass